Mengjiao Wang

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

REFA: Real-time Egocentric Facial Animations for Virtual Reality

Jan 07, 2026
Via arXiv

A Novel Discrete Memristor-Coupled Heterogeneous Dual-Neuron Model and Its Application in Multi-Scenario Image Encryption

May 30, 2025
Via arXiv

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

Apr 15, 2025
Via arXiv

FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution

Apr 13, 2025
Via arXiv

Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality

May 23, 2023
Via arXiv

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace

Feb 21, 2023
Via arXiv

FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning

Oct 26, 2022
Via arXiv

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Mar 10, 2022
Via arXiv

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Mar 01, 2022
Via arXiv